Stratified Sampling for Association Rules Mining
Authors
Abstract
It is well recognized that mining association rules in a very large database is usually time-consuming because of the I/O overhead of scanning the disk-resident database. Sampling, one of the techniques for reducing this I/O overhead, has been actively investigated for mining association rules during the last few years. Each sampling method and algorithm proposed in the literature has its own merits and demerits in terms of effectiveness and efficiency, and none of them can claim to be the best. Which sampling method to use, and how large the sample should be for a given database, are key issues in sampling for particular data mining tasks. In this paper, a transaction-size-based stratified sampling method is proposed, tested, and compared with the simple random sampling method for mining association rules. The work opens up the question of how to stratify datasets so that they better suit the problem of association rule mining.
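The abstract does not spell out how the strata are built or how samples are allocated, so the following is only a minimal sketch of one plausible reading: transactions are grouped by their size (number of items), and the same fraction is drawn from each group. The function names, the proportional allocation, and the rule of keeping at least one transaction per stratum are all assumptions made for illustration, not the authors' method. A simple random sample and a support helper are included for comparison.

```python
import random
from collections import defaultdict


def simple_random_sample(transactions, fraction, seed=0):
    """Draw a simple random sample without replacement."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    k = round(len(transactions) * fraction)
    return rng.sample(transactions, k)


def stratified_sample_by_size(transactions, fraction, seed=0):
    """Sample the same fraction from each transaction-size stratum.

    Assumption: one stratum per distinct transaction size, with
    proportional allocation and at least one transaction per stratum.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)

    # Group transactions by the number of items they contain.
    strata = defaultdict(list)
    for t in transactions:
        strata[len(t)].append(t)

    sample = []
    for size in sorted(strata):
        group = strata[size]
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample


def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


if __name__ == "__main__":
    # Toy database: many short transactions, few long ones.
    db = [("a",)] * 100 + [("a", "b")] * 50 + [("a", "b", "c")] * 10
    srs = simple_random_sample(db, 0.1, seed=1)
    strat = stratified_sample_by_size(db, 0.1, seed=1)
    print("full support of {a, b}:", support(db, {"a", "b"}))
    print("SRS estimate:", support(srs, {"a", "b"}))
    print("stratified estimate:", support(strat, {"a", "b"}))
```

In this sketch the stratified sample preserves the distribution of transaction sizes by construction, whereas a simple random sample only matches it on average. Whether that improves the accuracy of the mined rules is the question the paper examines.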
Similar resources
Model for Load Balancing on Processors in Parallel Mining of Frequent Itemsets
Given the existence of many large distributed transaction databases with high data schemas, the centralized approach for mining association rules in such databases will not be feasible. Some distributed algorithms have been developed [FDM, CD], but none of them has considered the problem of data skew in distributed mining of association rules. The skewness of datasets reduces the workload balancin...
Using a Data Mining Tool and FP-Growth Algorithm Application for Extraction of the Rules in two Different Dataset (TECHNICAL NOTE)
In this paper, we aim to improve association rules so that they can be used in recommenders. Recommender systems present a method for creating personalized offers. One of the most important types of recommender system is collaborative filtering, which mines user information and offers users appropriate items. Among the data mining methods, finding frequent item sets and ...
Introducing an algorithm for use to hide sensitive association rules through perturb technique
Due to the rapid growth of data mining technology, obtaining private data on users through this technology is becoming easier. Association rule mining is one of the data mining techniques used to extract useful patterns in the form of association rules. One of the main problems in applying this technique to databases is the disclosure of sensitive data, which endangers security and privacy. Hiding the as...
Employing data mining to explore association rules in drug addicts
Drug addiction is a major social, economic, and hygienic challenge that affects the whole community and needs serious attention. Available treatments are successful only in the short term unless the underlying reasons that make individuals prone to the phenomenon are investigated. Nowadays, there are some treatment centers which have comprehensive information about addicted people. Therefore, given the ...
ML-DS: A Novel Deterministic Sampling Algorithm for Association Rules Mining
Due to the explosive growth of data in every aspect of our lives, data mining algorithms often suffer from scalability issues. One effective way to tackle this problem is to employ sampling techniques. This paper introduces ML-DS, a novel deterministic sampling algorithm for mining association rules in large datasets. Unlike most algorithms in the literature that use randomness in sampling, our...
Journal title:
Volume, issue:
Pages: -
Publication date: 2005